Latent Dirichlet Bayesian Co-Clustering
Authors
Abstract
Co-clustering has emerged as an important technique for mining contingency data matrices. However, almost all existing co-clustering algorithms perform hard partitioning, assigning each row and column of the data matrix to exactly one cluster. Recently, a Bayesian co-clustering approach has been proposed that allows rows and columns to have probabilistic (soft) membership in row and column clusters; this approach uses variational inference for parameter estimation. In this work, we modify the Bayesian co-clustering model and use collapsed Gibbs sampling and collapsed variational inference for parameter estimation. Our empirical evaluation on real data sets shows that both collapsed Gibbs sampling and collapsed variational inference find more accurate likelihood estimates than the standard variational Bayesian co-clustering approach.
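To make the collapsed inference concrete, the sketch below implements a collapsed Gibbs sampler for a simple latent Dirichlet co-clustering model over a categorical data matrix. It illustrates the general technique only, not the authors' exact model or code; the categorical entry likelihood, the hyperparameters alpha, beta, and gamma, and the function name are assumptions made for this sketch.

```python
# Illustrative collapsed Gibbs sampler for a latent Dirichlet co-clustering
# model with categorical matrix entries (a sketch, not the paper's code).
import numpy as np

def collapsed_gibbs_coclustering(X, K1, K2, V, alpha=0.1, beta=0.1, gamma=0.1,
                                 n_iters=200, seed=0):
    """X: list of (row, col, value) triples with value in {0, ..., V-1}.
    K1 / K2: number of row / column clusters. V: number of entry values."""
    rng = np.random.default_rng(seed)
    n_rows = max(r for r, _, _ in X) + 1
    n_cols = max(c for _, c, _ in X) + 1

    # Count tables: the sufficient statistics left after collapsing the
    # Dirichlet mixing proportions and the co-cluster emission parameters.
    row_counts = np.zeros((n_rows, K1))       # observations of row u in row cluster i
    col_counts = np.zeros((n_cols, K2))       # observations of col v in col cluster j
    cocluster_counts = np.zeros((K1, K2, V))  # value x emitted by co-cluster (i, j)
    cocluster_totals = np.zeros((K1, K2))

    # Random initialisation of the latent cluster pair for each observation.
    z1 = rng.integers(0, K1, size=len(X))
    z2 = rng.integers(0, K2, size=len(X))
    for n, (r, c, x) in enumerate(X):
        row_counts[r, z1[n]] += 1
        col_counts[c, z2[n]] += 1
        cocluster_counts[z1[n], z2[n], x] += 1
        cocluster_totals[z1[n], z2[n]] += 1

    for _ in range(n_iters):
        for n, (r, c, x) in enumerate(X):
            i, j = z1[n], z2[n]
            # Remove the current assignment from the counts.
            row_counts[r, i] -= 1
            col_counts[c, j] -= 1
            cocluster_counts[i, j, x] -= 1
            cocluster_totals[i, j] -= 1

            # Conditional over the (row cluster, column cluster) pair with the
            # mixing proportions and emission parameters integrated out.
            p = ((row_counts[r][:, None] + alpha) *
                 (col_counts[c][None, :] + beta) *
                 (cocluster_counts[:, :, x] + gamma) /
                 (cocluster_totals + V * gamma))
            p = p.ravel() / p.sum()
            k = rng.choice(K1 * K2, p=p)
            i, j = divmod(k, K2)

            # Record the new assignment and restore the counts.
            z1[n], z2[n] = i, j
            row_counts[r, i] += 1
            col_counts[c, j] += 1
            cocluster_counts[i, j, x] += 1
            cocluster_totals[i, j] += 1

    return z1, z2
```

A call such as `collapsed_gibbs_coclustering(X, K1=3, K2=4, V=5)` returns one row-cluster and one column-cluster assignment per observed entry; averaging assignments over several samples gives the kind of soft row and column memberships described in the abstract.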
Related papers
Nonparametric Bayesian Methods for Relational Clustering
An important task in data mining is to identify natural clusters in data. Relational clustering [1], also known as co-clustering for dyadic data, uses information about related objects to help identify the cluster to which an object belongs. For example, words can be used to help cluster documents in which the words occur; conversely, documents can be used to help cluster the words occurring in...
Bayesian latent topic clustering model
Document modeling is important for document retrieval and categorization. The probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) are popular paradigms of document models where word/document correlations are inferred by latent topics. In PLSA and LDA, the unseen words and documents are not explicitly represented at the same time. Model generalization is constrain...
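As a concrete illustration of inferring word/document correlations through latent topics, the snippet below fits a small LDA model with gensim on a toy corpus. The corpus, the number of topics, and the training settings are assumptions for illustration only; this is not the model proposed in the cited paper.

```python
# Minimal LDA example with gensim: latent topics link words and documents.
from gensim import corpora, models

documents = [
    ["bayesian", "inference", "gibbs", "sampling"],
    ["document", "retrieval", "topic", "model"],
    ["gibbs", "sampling", "topic", "model"],
]
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Fit a two-topic LDA model and print the top words of each topic.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])
```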
Discipline Hotspots Mining Based on Hierarchical Dirichlet Topic Clustering and Co-word Network
Discovering inherent correlations and hot research topics among various disciplines from massive scientific documents is very important for understanding scientific research trends. The LDA (Latent Dirichlet Allocation) topic model can find topics in large data sets, but the number of topics must be specified before topic clustering, and there is considerable randomness in determining the number of topic...
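The contrast drawn above is between LDA, where the number of topics is fixed in advance, and hierarchical Dirichlet process models, where it is inferred from the data. The sketch below shows that difference with gensim's HdpModel on a toy corpus; the corpus and settings are assumptions for illustration, not data or code from the cited paper.

```python
# HDP topic model: the effective number of topics is inferred, not preset.
from gensim import corpora
from gensim.models import HdpModel

documents = [
    ["dirichlet", "process", "topic", "model"],
    ["co", "word", "network", "discipline"],
    ["topic", "model", "research", "hotspot"],
]
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# No num_topics argument is required, unlike LDA.
hdp = HdpModel(corpus, id2word=dictionary)
for topic_id, words in hdp.show_topics(num_topics=5, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])
```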
Hierarchical Latent Word Clustering
This paper presents a new Bayesian non-parametric model that extends Hierarchical Latent Dirichlet Allocation to extract tree-structured word clusters from text data. The inference algorithm of the model groups words into a cluster if they share a similar distribution over documents. In our experiments, we observed meaningful hierarchical structures on the NIPS corpus and radiology repor...
Robust Bayesian Max-Margin Clustering
We present Bayesian max-margin clustering (BMC), a general and robust framework that incorporates the max-margin criterion into Bayesian clustering models, as well as two concrete models of BMC that demonstrate its flexibility and effectiveness on different clustering tasks. The Dirichlet process max-margin Gaussian mixture is a nonparametric Bayesian clustering model that relaxes th...